Abstract. We analyse the computational complexity of finding Nash
equilibria in simple stochastic multiplayer games. We show that restricting
the search space to equilibria whose payoffs fall into a certain interval may
lead to undecidability. In particular, we prove that the following problem is
undecidable: Given a game $\mathcal{G}$, does there exist a pure-strategy Nash
equilibrium of $\mathcal{G}$ where player 0 wins with probability 1? Moreover,
this problem remains undecidable if it is restricted to strategies with
(unbounded) finite memory. However, if mixed strategies are allowed,
decidability remains an open problem. One way to obtain a provably decidable
variant of the problem is to restrict the strategies to be positional or
stationary. For the complexity of these two problems, we obtain a common
lower bound of NP and upper bounds of NP and PSpace respectively.

1 Introduction 



We study stochastic games [18] played by multiple players on a finite, directed
graph. Intuitively, a play of such a game evolves by moving a token along
edges of the graph: Each vertex of the graph is either controlled by one of
the players, or it is a stochastic vertex. Whenever the token arrives at a
non-stochastic vertex, the player who controls this vertex must move the token
to a successor vertex; when the token arrives at a stochastic vertex, a fixed
probability distribution determines the next vertex. The play ends when it
reaches a terminal vertex, in which case each player receives a payoff. In the
simplest case, which we discuss here, the possible payoffs of a single play
are just 0 and 1 (i.e. each player either wins or loses a given play). However,
due to the presence of stochastic vertices, a player's expected payoff (i.e. her
probability of winning) can be an arbitrary probability.

Stochastic games have been successfully applied in the verification and
synthesis of reactive systems under the influence of random events. Such a
system is usually modelled as a game between the system and its environment,
where the environment's objective is the complement of the system's objective:
the environment is considered hostile. Therefore, traditionally, the research
in this area has concentrated on two-player games where each play is won by
precisely one of the two players, so-called two-player, zero-sum games. However,
the system may consist of several components with independent objectives,
a situation which is naturally modelled by a multiplayer game.

The most common interpretation of rational behaviour in multiplayer
games is captured by the notion of a Nash equilibrium [17]. In a Nash
equilibrium, no player can improve her payoff by unilaterally switching to a
different strategy. Chatterjee & al. [6] showed that any simple stochastic
multiplayer game has a Nash equilibrium, and they also gave an algorithm for
computing one. We argue that this is not satisfactory. Indeed, it can be shown
that their algorithm may compute an equilibrium where all players lose almost
surely (i.e. receive expected payoff 0), while there exist other equilibria
where all players win almost surely (i.e. receive expected payoff 1).

In applications, one might look for an equilibrium where as many players
as possible win almost surely or where it is guaranteed that the expected
payoff of the equilibrium falls into a certain interval. Formulated as a decision
problem, we want to know, given a $k$-player game $\mathcal{G}$ with initial vertex $v_0$ and
two thresholds $x, y \in [0,1]^k$, whether $(\mathcal{G}, v_0)$ has a Nash equilibrium with
expected payoff at least $x$ and at most $y$. This problem, which we call NE for
short, is a generalisation of Condon's SSG Problem [8] asking whether in a
two-player, zero-sum game one of the two players, say player 0, has a strategy
to win the game with probability at least $\frac{1}{2}$.

The problem NE comes in several variants, depending on the type of
strategies one considers: On the one hand, strategies may be mixed (allowing
randomisation over actions) or pure (not allowing such randomisation). On
the other hand, one can restrict to strategies that use (unbounded or bounded)
finite memory or even to stationary ones (strategies that do not use any
memory at all). For the SSG Problem, this distinction is not meaningful
since in a two-player, zero-sum simple stochastic game both players have
an optimal positional (i.e. both pure and stationary) strategy [8]. However,
regarding NE this distinction leads to distinct decision problems, which have
to be analysed separately.

Our main result is that NE is undecidable if only pure strategies are
considered. In fact, even the following, presumably simpler, problem is
undecidable: Given a game $\mathcal{G}$, decide whether there exists a pure Nash
equilibrium where player 0 wins almost surely. Moreover, the problem
remains undecidable if one restricts to pure strategies that use (unbounded)
finite memory. However, for the general case of arbitrary mixed strategies,
decidability remains an open problem.

If one restricts to simpler types of strategies like stationary ones, the 
problem becomes provably decidable. In particular, for positional strategies 
the problem becomes NP-complete, and for arbitrary stationary strategies the 
problem is NP-hard but contained in PSpace. We also relate the complexity of
the latter problem to the complexity of the infamous Square Root Sum Problem
(SqrtSum) by providing a polynomial-time reduction from SqrtSum to NE 
with the restriction to stationary strategies. It is a long-standing open problem 
whether SqrtSum falls into the polynomial hierarchy; hence, showing that 
NE for stationary strategies lies inside the polynomial hierarchy would imply 
a breakthrough in complexity theory. 

Let us remark that our game model is rather restrictive: Firstly, players
receive a payoff only at terminal vertices. In the literature, a plethora of game
models with more complicated modes of winning have been discussed. In
particular, the model of a stochastic parity game [5, 24] has been investigated
thoroughly. Secondly, our model is turn-based (i.e. for every non-stochastic
vertex there is only one player who controls this vertex) as opposed to
concurrent [12, 11]. The reason that we have chosen to analyse such a restrictive
model is that we are focussing on negative results. Indeed, all our lower bounds
hold for (multiplayer versions of) the aforementioned models. Moreover, besides
Nash equilibria, our negative results apply to several other solution concepts
like subgame perfect equilibria [21, 22] and secure equilibria [4].

For games with rewards on transitions [15], the situation might be
different: While our lower bounds can be applied to games with rewards under
the average reward or the total expected reward criterion, we leave it as an open
question whether this remains true in the case of discounted rewards.

Related Work. Determining the complexity of Nash equilibria has attracted
much interest in recent years. In particular, a series of papers culminated
in the result that computing a Nash equilibrium of a two-player game in
strategic form is complete for the complexity class PPAD [10, 7]. More in
the spirit of our work, Conitzer and Sandholm [9] showed that deciding
whether there exists a Nash equilibrium in a two-player game in strategic
form where player 0 receives payoff at least $x$ and related decision problems
are all NP-hard. For infinite games (without stochastic vertices), (a qualitative
version of) the problem NE was studied in [23]. In particular, it was shown
that the problem is NP-complete for games with parity winning conditions
and even in P for games with Büchi winning conditions.

For stochastic games, most results concern the classical SSG problem:
Condon showed that the problem is in NP $\cap$ co-NP [8], but it is not known
to be in P. We are only aware of two results that are closely related to our
problem: Firstly, Etessami & al. [13] investigated Markov decision processes
with, e.g., multiple reachability objectives. Such a system can be viewed as a
stochastic multiplayer game where all non-stochastic vertices are controlled
by one single player. Under this interpretation, one of their results states that
NE is decidable in polynomial time for such games. Secondly, Chatterjee &
al. [6] showed that the problem of deciding whether a (concurrent) stochastic
game with reachability objectives has a positional-strategy Nash equilibrium
with payoff at least $x$ is NP-complete. We sharpen their hardness result by
showing that the problem remains NP-hard when it is restricted to games with
only three players (as opposed to an unbounded number of players) where,
additionally, payoffs are assigned at terminal vertices only (cf. Theorem 5 and
the subsequent remark).



2 Simple stochastic multiplayer games 

The model of a (two-player, zero-sum) simple stochastic game, introduced by
Condon [8], easily generalises to the multiplayer case: Formally, we define a simple
stochastic multiplayer game (SSMG) as a tuple
$\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, \Delta, (F_i)_{i \in \Pi})$
such that:

• $\Pi$ is a finite set of players (usually $\Pi = \{0, 1, \ldots, k-1\}$);

• $V$ is a finite set of vertices;

• $V_i \subseteq V$ and $V_i \cap V_j = \emptyset$ for each $i \neq j \in \Pi$;

• $\Delta \subseteq V \times ([0,1] \cup \{\perp\}) \times V$ is the transition relation;

• $F_i \subseteq V$ for each $i \in \Pi$.

We call a vertex $v \in V_i$ controlled by player $i$ and a vertex that is not contained
in any of the sets $V_i$ a stochastic vertex. We require that a transition is labelled
by a probability iff it originates in a stochastic vertex: If $(v, p, w) \in \Delta$ then
$p \in [0,1]$ if $v$ is a stochastic vertex and $p = \perp$ if $v \in V_i$ for some $i \in \Pi$.
Moreover, for each pair of a stochastic vertex $v$ and an arbitrary vertex $w$, we
require that there exists precisely one $p \in [0,1]$ such that $(v, p, w) \in \Delta$. For
computational purposes, we require additionally that all these probabilities
are rational.

For a given vertex $v \in V$, we denote the set of all $w \in V$ such that there
exists $p \in (0,1] \cup \{\perp\}$ with $(v, p, w) \in \Delta$ by $v\Delta$. For technical reasons, we
require that $v\Delta \neq \emptyset$ for all $v \in V$. Moreover, for each stochastic vertex $v$,
the outgoing probabilities must sum up to 1: $\sum_{(p,w) : (v,p,w) \in \Delta} p = 1$. Finally,
we require that each vertex $v$ that lies in one of the sets $F_i$ is a terminal (sink)
vertex: $v\Delta = \{v\}$. So if $F$ is the set of all terminal vertices, then $F_i \subseteq F$ for
each $i \in \Pi$.
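
To make the definition concrete, the following Python sketch mirrors the components of an SSMG in a small container and checks the side conditions stated above. It is only an illustration: the class name `SSMG`, its field names and the method `validate` are assumptions made for this example and do not appear in the paper. Storing the controlling player in a dictionary makes the sets $V_i$ disjoint by construction.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class SSMG:
    """Illustrative container for an SSMG (all names here are hypothetical)."""
    players: list   # the set Pi
    vertices: list  # the set V
    owner: dict     # v -> i for v in V_i; stochastic vertices are absent
    moves: dict     # controlled v -> set of successors (transitions labelled "bottom")
    prob: dict      # stochastic v -> {w: Fraction}, the probability labels
    win: dict       # i -> the set F_i

    def successors(self, v):
        """The set v-Delta: all w reachable by a move or with positive probability."""
        if v in self.owner:
            return set(self.moves[v])
        return {w for w, p in self.prob[v].items() if p > 0}

    def validate(self):
        """Check the requirements on v-Delta, probabilities and terminal vertices."""
        for v in self.vertices:
            assert self.successors(v), f"{v} has no successor"
            if v not in self.owner:
                assert sum(self.prob[v].values()) == 1, f"{v}: labels must sum to 1"
        for i in self.players:
            for v in self.win[i]:
                assert self.successors(v) == {v}, f"{v} must be a terminal vertex"
        return True


# A toy two-player game: a fair coin at "s" decides which player wins.
half = Fraction(1, 2)
toy = SSMG(
    players=[0, 1],
    vertices=["s", "a", "b"],
    owner={},
    moves={},
    prob={"s": {"a": half, "b": half}, "a": {"a": Fraction(1)}, "b": {"b": Fraction(1)}},
    win={0: {"a"}, 1: {"b"}},
)
```

Exact rationals (`Fraction`) are used because the definition requires all probabilities to be rational, and floating-point sums would make the "sum up to 1" check unreliable.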

A (mixed) strategy of player $i$ in $\mathcal{G}$ is a mapping $\sigma : V^* V_i \to \mathcal{D}(V)$ assigning
to each possible history $xv \in V^* V_i$ of vertices ending in a vertex controlled by
player $i$ a (discrete) probability distribution over $V$ such that $\sigma(xv)(w) > 0$
only if $(v, \perp, w) \in \Delta$. Instead of $\sigma(xv)(w)$, we usually write $\sigma(w \mid xv)$. A
(mixed) strategy profile of $\mathcal{G}$ is a tuple $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ where $\sigma_i$ is a strategy of
player $i$ in $\mathcal{G}$. Given a strategy profile $\overline{\sigma} = (\sigma_j)_{j \in \Pi}$ and a strategy $\tau$ of player $i$,
we denote by $(\overline{\sigma}_{-i}, \tau)$ the strategy profile resulting from $\overline{\sigma}$ by replacing $\sigma_i$
with $\tau$.

A strategy $\sigma$ of player $i$ is called pure if for each $xv \in V^* V_i$ there exists
$w \in v\Delta$ with $\sigma(w \mid xv) = 1$. Note that a pure strategy of player $i$ can be
identified with a function $\sigma : V^* V_i \to V$. A strategy profile $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ is
called pure if each $\sigma_i$ is pure.

A strategy $\sigma$ of player $i$ in $\mathcal{G}$ is called stationary if $\sigma$ depends only on the
current vertex: $\sigma(xv) = \sigma(v)$ for all $xv \in V^* V_i$. Hence, a stationary strategy
of player $i$ can be identified with a function $\sigma : V_i \to \mathcal{D}(V)$. A strategy profile
$\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ of $\mathcal{G}$ is called stationary if each $\sigma_i$ is stationary.

We call a pure, stationary strategy a positional strategy and a strategy
profile consisting of positional strategies only a positional strategy profile. Clearly,
a positional strategy of player $i$ can be identified with a function $\sigma : V_i \to V$.
More generally, a pure strategy $\sigma$ is called finite-state if it can be implemented
by a finite automaton with output or, equivalently, if the equivalence relation
$\sim \subseteq V^* \times V^*$ defined by $x \sim y$ if $\sigma(xz) = \sigma(yz)$ for all $z \in V^* V_i$ has only
finitely many equivalence classes.^1 Finally, a finite-state strategy profile is a
profile consisting of finite-state strategies only.

It is sometimes convenient to designate an initial vertex $v_0 \in V$ of the
game. We call the tuple $(\mathcal{G}, v_0)$ an initialised SSMG. A strategy (strategy
profile) of $(\mathcal{G}, v_0)$ is just a strategy (strategy profile) of $\mathcal{G}$. In the following, we
will use the abbreviation SSMG also for initialised SSMGs. It should always
be clear from the context if the game is initialised or not.

Given an SSMG $(\mathcal{G}, v_0)$ and a strategy profile $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$, the conditional
probability of $w \in V$ given the history $xv \in V^* V$ is the number $\sigma_i(w \mid xv)$ if
$v \in V_i$ and the unique $p \in [0,1]$ such that $(v, p, w) \in \Delta$ if $v$ is a stochastic
vertex. We abuse notation and denote this probability by $\overline{\sigma}(w \mid xv)$. The
probabilities $\overline{\sigma}(w \mid xv)$ induce a probability measure on the space $V^\omega$ in
the following way: The probability of a basic open set $v_1 \ldots v_k \cdot V^\omega$ is 0 if
$v_1 \neq v_0$ and the product of the probabilities $\overline{\sigma}(v_j \mid v_1 \ldots v_{j-1})$ for $j = 2, \ldots, k$
otherwise. It is a classical result of measure theory that this extends to a
unique probability measure assigning a probability to every Borel subset of
$V^\omega$, which we denote by $\Pr^{\overline{\sigma}}_{v_0}$.
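
The probability of a basic open set is just a product of conditional probabilities along the prefix, which the following Python sketch spells out. The function names `cylinder_probability` and `toy_profile` and the two-vertex example are assumptions introduced here for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def cylinder_probability(prefix, v0, cond_prob):
    """Probability of the basic open set v_1 ... v_k . V^omega.

    `prefix` is the non-empty sequence v_1, ..., v_k, and
    `cond_prob(history, w)` returns the conditional probability of w
    given the non-empty tuple of vertices `history`.
    """
    if prefix[0] != v0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(1, len(prefix)):
        # multiply by the probability of v_{j+1} given v_1 ... v_j
        result *= cond_prob(tuple(prefix[:j]), prefix[j])
    return result


def toy_profile(history, w):
    """Hypothetical example: at "s" a fair coin picks "s" or "t"; "t" is a sink."""
    table = {
        "s": {"s": Fraction(1, 2), "t": Fraction(1, 2)},
        "t": {"t": Fraction(1)},
    }
    return table[history[-1]].get(w, Fraction(0))
```

In this toy example the conditional probabilities depend only on the last vertex, but `cond_prob` receives the whole history, so strategies with memory fit the same interface.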

For a set $U \subseteq V$, let $\mathrm{Reach}(U) := V^* \cdot U \cdot V^\omega$. We are mainly interested
in the probabilities $p_i := \Pr^{\overline{\sigma}}_{v_0}(\mathrm{Reach}(F_i))$ of reaching the sets $F_i$. We call $p_i$
the (expected) payoff of $\overline{\sigma}$ for player $i$ and the vector $(p_i)_{i \in \Pi}$ the (expected) payoff
of $\overline{\sigma}$. Another way to define these probabilities is via the Markov chain $\mathcal{G}^{\overline{\sigma}}$,
which is defined as follows: The state set of $\mathcal{G}^{\overline{\sigma}}$ is $V^+$ (the set of all nonempty
sequences of vertices), and the probability of going from state $xv$ to state $xvw$
($x \in V^*$, $v, w \in V$) is equal to $\overline{\sigma}(w \mid xv)$. Then the expected payoff of $\overline{\sigma}$ for
player $i$ can be computed as the probability of reaching a state $xv$ with $v \in F_i$
from state $v_0$ in $\mathcal{G}^{\overline{\sigma}}$.

Drawing an SSMG. When drawing an SSMG as a graph, we will use the
following conventions: The initial vertex is marked by an incoming edge that
has no source vertex. Vertices that are controlled by a player are depicted
as circles, where the player who controls a vertex is given by the label
next to it. Stochastic vertices are depicted as diamonds, where the transition
probabilities are given by the labels on its outgoing edges (the default being $\frac{1}{2}$).
Finally, terminal vertices are generally represented by their associated payoff
vector. In fact, we allow arbitrary vectors of rational probabilities as payoffs.
This does not increase the power of the model since such a payoff vector can
easily be realised by an SSMG consisting of stochastic and terminal vertices
only.

^1 In general, this definition is applicable to mixed strategies as well, but for this paper we
will identify finite-state strategies with pure finite-state strategies.

3 Nash equilibria 

To capture rational behaviour of (selfish) players, John Nash [17] introduced
the notion of what is now called a Nash equilibrium. Formally, given a strategy
profile $\overline{\sigma}$, a strategy $\tau$ of player $i$ is called a best response to $\overline{\sigma}$ if $\tau$ maximises the
expected payoff of player $i$:
$\Pr^{(\overline{\sigma}_{-i}, \tau')}_{v_0}(\mathrm{Reach}(F_i)) \leq \Pr^{(\overline{\sigma}_{-i}, \tau)}_{v_0}(\mathrm{Reach}(F_i))$ for all
strategies $\tau'$ of player $i$. A Nash equilibrium is a strategy profile $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$
such that each $\sigma_i$ is a best response to $\overline{\sigma}$. Hence, in a Nash equilibrium
no player can improve her payoff by (unilaterally) switching to a different
strategy.

Previous research on algorithms for finding Nash equilibria in infinite 
games has focused on computing some Nash equilibrium Q. However, a 
game may have several Nash equilibria with different payoffs, and one might 
not be interested in any Nash equilibrium but in one whose payoff fulfils 
certain requirements. For example, one might look for a Nash equilibrium 
where certain players win almost surely while certain others lose almost 
surely. This idea leads us to the following decision problem, which we call
NE:^2

Given an SSMG $(\mathcal{G}, v_0)$ and thresholds $x, y \in [0,1]^{\Pi}$, decide whether
there exists a Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $\geq x$ and $\leq y$.

For computational purposes, we assume that the thresholds $x$ and $y$ are
vectors of rational numbers. A variant of the problem which omits the
thresholds just asks about a Nash equilibrium where some distinguished
player, say player 0, wins with probability 1:

Given an SSMG $(\mathcal{G}, v_0)$, decide whether there exists a Nash
equilibrium of $(\mathcal{G}, v_0)$ where player 0 wins almost surely.

Clearly, every instance of the threshold-free variant can easily be turned
into an instance of NE (by adding the thresholds $x = (1, 0, \ldots, 0)$ and
$y = (1, \ldots, 1)$). Hence, NE is, a priori, more general than its threshold-free variant.

Our main concern in this paper are variants of NE where we restrict
the type of strategies that are allowed in the definition of the problem: Let
PureNE, FinNE, StatNE and PosNE be the problems that arise from NE by
restricting the desired Nash equilibrium to consist of pure strategies,
finite-state strategies, stationary strategies and positional strategies, respectively.

In the rest of this paper, we are going to prove upper and lower bounds
on the complexity of these problems, where all lower bounds hold for the
threshold-free variants, too.

^2 In the definition of NE, the ordering $\leq$ is applied componentwise.

Our first observation is that neither stationary nor pure strategies are
sufficient to implement any Nash equilibrium, even if we are only interested
in whether a player wins or loses almost surely in the Nash equilibrium.
Together with a result from Section 5 (namely Proposition 10), this demonstrates
that the problems NE, PureNE, FinNE, StatNE, and PosNE are pairwise
distinct problems, which have to be analysed separately.

Proposition 1. There exists an SSMG that has a finite-state Nash equilibrium
where player 0 wins almost surely but that has no stationary Nash equilibrium
where player 0 wins with positive probability.



Proof. Consider the game $\mathcal{G}$ depicted in Figure 1 played by three players 0,
1 and 2 (with payoffs in this order). Obviously, the following finite-state
strategy profile is a Nash equilibrium where player 0 wins almost surely:
Player 1 plays from vertex $v_2$ to vertex $v_3$ at the first visit of $v_2$ but leaves
the game immediately (by playing to the neighbouring terminal vertex) at all
subsequent visits to $v_2$; from vertex $v_0$ player 1 plays to $v_1$; player 2 plays from
vertex $v_3$ to vertex $v_4$ at the first visit of $v_3$ but leaves the game immediately
at all subsequent visits to $v_3$; from vertex $v_1$ player 2 plays to $v_2$.



[Figure 1: game graph not reproduced; terminal payoff vectors shown:
(0,0,0), (1,1,0), (0,1,0), (0,0,1), (1,0,1).]

Figure 1. An SSMG with three players



It remains to show that there is no stationary Nash equilibrium of $(\mathcal{G}, v_0)$
where player 0 wins with positive probability. Any stationary Nash
equilibrium of $(\mathcal{G}, v_0)$ where player 0 wins with positive probability induces a
stationary Nash equilibrium of $(\mathcal{G}, v_2)$ where both players 1 and 2 receive
payoff at least $\frac{1}{2}$ since otherwise one of these players could improve her payoff by
changing her strategy at $v_0$ or $v_1$. Hence, it suffices to show that $(\mathcal{G}, v_2)$ has no
stationary Nash equilibrium where both players 1 and 2 receive payoff at least
$\frac{1}{2}$. Assume there exists such an equilibrium and denote by $p$ the probability
that player 2 plays from $v_3$ to $v_4$. Since player 1 wins with probability $> 0$, it
must be the case that $p > 0$. But then, to have a Nash equilibrium, player 1
must play from $v_2$ to $v_3$ with probability 1, giving player 2 a payoff of 0, a
contradiction. q.e.d.



Proposition 2. There exists an SSMG that has a stationary Nash equilibrium
where player 0 wins almost surely but that has no pure Nash equilibrium
where player 0 wins with positive probability.



Proof. Consider the game depicted in Figure 2 played by three players 0, 1
and 2 (with payoffs given in this order). Clearly, the stationary strategy profile
where from vertex $v_2$ player 0 selects both outgoing edges with probability $\frac{1}{2}$
each, player 1 plays from $v_0$ to $v_1$ and player 2 plays from $v_1$ to $v_2$ is a
Nash equilibrium where player 0 wins almost surely. However, for any pure
strategy profile where player 0 wins almost surely, either player 1 or player 2
receives payoff 0 and could improve her payoff by switching her strategy at
$v_0$ or $v_1$ respectively. q.e.d.

[Figure 2: game graph not reproduced; terminal payoff vectors shown:
(1,0,1), (1,1,0), (0,1,0), (0,0,1).]

Figure 2. Another SSMG with three players




4 Decidable variants of NE 

4.1 Upper bounds 

In this section, we show that the problems PosNE and StatNE are contained 
in the complexity classes NP and PSpace respectively. 

Theorem 3. PosNE is in NP.

Proof. Let $(\mathcal{G}, v_0)$ be an SSMG. Any positional strategy profile of $\mathcal{G}$ can be
identified with a mapping $\overline{\sigma} : \bigcup_{i \in \Pi} V_i \to V$ such that $(v, \perp, \overline{\sigma}(v)) \in \Delta$ for
each non-stochastic vertex $v$, an object whose size is linear in the size of $\mathcal{G}$. To
prove that PosNE is in NP, it suffices to show that we can check in polynomial
time whether such a mapping $\overline{\sigma}$ constitutes a Nash equilibrium whose payoff
lies in between the given thresholds $x$ and $y$.

First, we need to compute the payoff of $\overline{\sigma}$. Let $z^i_v := \Pr^{\overline{\sigma}}_{v}(\mathrm{Reach}(F_i))$
denote the expected payoff of $\overline{\sigma}$ for player $i$ in $(\mathcal{G}, v)$, and let $z^i = (z^i_v)_{v \in V}$. It
is a well-known result of the theory of Markov chains that $z^i$ is the optimal
solution of the following linear programme:

Minimise $\sum_{v \in V} z^i_v$, subject to:

    $z^i_v \geq 0$ for $v \in V$,
    $z^i_v = 1$ for $v \in F_i$,
    $z^i_v = \sum_{w \in v\Delta} \overline{\sigma}(w \mid v) \cdot z^i_w$ for $v \in V \setminus F_i$.

Once we have computed $z^i$, we can check whether $x_i \leq z^i_{v_0} \leq y_i$; this
inequality holds for each player $i \in \Pi$ iff the payoff of $\overline{\sigma}$ lies in between $x$
and $y$.

To check whether $\overline{\sigma}$ is a Nash equilibrium, we need to compute the
numbers $\sup_\tau \Pr^{(\overline{\sigma}_{-i}, \tau)}_{v_0}(\mathrm{Reach}(F_i))$ (where $\tau$ ranges over every strategy of
player $i$ in $\mathcal{G}$), the maximal payoff that player $i$ can achieve when playing
against $\overline{\sigma}_{-i}$. If this payoff is equal to $z^i_{v_0}$, then she cannot gain anything
by unilaterally switching to any other strategy. From the theory of Markov
decision processes (cf. [19]), it is well-known that the desired payoff can be
computed by the following linear programme over the variables $r^i = (r^i_v)_{v \in V}$:

Minimise $\sum_{v \in V} r^i_v$, subject to:

    $r^i_v \geq 0$ for $v \in V$,
    $r^i_v = 1$ for $v \in F_i$,
    $r^i_v \geq r^i_w$ for $v \in V_i$ and $w \in v\Delta$,
    $r^i_v = \sum_{w \in v\Delta} \overline{\sigma}(w \mid v) \cdot r^i_w$ for $v \in V \setminus V_i$.

To check whether $\overline{\sigma}$ is a Nash equilibrium, it suffices to compute for each
player $i$ the optimal solution $r^i$ and to check whether $r^i_{v_0} = z^i_{v_0}$.

Since linear programmes can be solved in polynomial time and both
programmes are of size polynomial in the size of the game, all these checks
can be carried out in polynomial time. q.e.d.
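
The verification step of this proof can be sketched in a few lines of Python. Note that the sketch deliberately swaps the technique: instead of solving the two linear programmes exactly, it approximates their optimal solutions by value iteration from below, which converges to the same least solutions but comes with no polynomial-time or exactness guarantee. All names (`reach_values`, `is_positional_equilibrium`, the dictionary encoding of the game, the tolerance `tol`) are assumptions made for this illustration.

```python
def reach_values(vertices, targets, step, rounds=10_000):
    """Iterate val <- step(val) starting from 0, keeping val = 1 on `targets`."""
    val = {v: 1.0 if v in targets else 0.0 for v in vertices}
    for _ in range(rounds):
        new = {v: 1.0 if v in targets else step(v, val) for v in vertices}
        if max(abs(new[v] - val[v]) for v in vertices) < 1e-12:
            return new
        val = new
    return val


def is_positional_equilibrium(vertices, owner, moves, prob, choice, win, v0, tol=1e-6):
    """Approximate check that the positional profile `choice` is a Nash equilibrium.

    owner:  controlled vertex -> player      moves: controlled vertex -> successors
    prob:   stochastic vertex -> {w: p}      choice: controlled vertex -> chosen successor
    win:    player -> set of winning terminal vertices
    """
    def expected(v, val):
        # one step under the fixed profile (the z-system)
        if v in owner:
            return val[choice[v]]
        return sum(p * val[w] for w, p in prob[v].items())

    for i, targets in win.items():
        def best(v, val, i=i):
            # one step where player i deviates optimally (the r-system)
            if owner.get(v) == i:
                return max(val[w] for w in moves[v])
            return expected(v, val)

        z = reach_values(vertices, targets, expected)
        r = reach_values(vertices, targets, best)
        if r[v0] > z[v0] + tol:
            return False  # player i can improve her payoff
    return True


# Toy game: player 0 chooses between a winning sink "a" and a losing sink "b".
vertices = ["v0", "a", "b"]
owner = {"v0": 0}
moves = {"v0": ["a", "b"]}
prob = {"a": {"a": 1.0}, "b": {"b": 1.0}}
win = {0: {"a"}}
```

A faithful implementation of the NP upper bound would replace `reach_values` by an exact polynomial-time linear programming routine over the rationals; the threshold check $x_i \leq z^i_{v_0} \leq y_i$ is omitted here for brevity.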

To prove the decidability of StatNE, we appeal to results established for
the Existential Theory of the Reals, ExTh($\mathfrak{R}$), the set of all existential first-order
sentences (over the appropriate signature) that hold in $\mathfrak{R} := (\mathbb{R}, +, \cdot, 0, 1, \leq)$.
The best known upper bound for the complexity of the associated decision
problem is PSpace [3, 20], which leads to the following theorem.

Theorem 4. StatNE is in PSpace. 

Proof. Instead of giving a deterministic polynomial-space algorithm for
StatNE, we give a nondeterministic one. Since PSpace = NPSpace, this
implies that StatNE is in PSpace. On input $\mathcal{G}, v_0, x, y$, the algorithm starts by
guessing a set $S \subseteq V \times V$ and proceeds by computing, for each player $i$, the set
$R_i$ of vertices from where the set $F_i$ is reachable in the graph $G = (V, S)$, a
computation which can be carried out in polynomial time. Note that if $S$ is the
support of a stationary strategy profile $\overline{\sigma}$, i.e. $S = \{(v, w) \in V \times V : \overline{\sigma}(w \mid v) > 0\}$,
then $R_i$ is precisely the set of vertices $v$ such that $\Pr^{\overline{\sigma}}_{v}(\mathrm{Reach}(F_i)) > 0$. Finally,
the algorithm evaluates an existential first-order sentence $\psi$, which can be
computed in polynomial time from $(\mathcal{G}, v_0)$, $x$, $y$, $S$ and $(R_i)_{i \in \Pi}$, over $\mathfrak{R}$ and
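returns the answer to this query.

It remains to describe a suitable sentence $\psi$. Let $\overline{\alpha} = (\alpha_{vw})_{v,w \in V}$,
$\overline{r} = (r^i_v)_{i \in \Pi, v \in V}$ and $\overline{z} = (z^i_v)_{i \in \Pi, v \in V}$ be three sets of variables, and let
$V_* = \bigcup_{i \in \Pi} V_i$ be the set of all non-stochastic vertices. The formula

$$\varphi(\overline{\alpha}) := \bigwedge_{v \in V_*} \Big( \bigwedge_{w \in v\Delta} \alpha_{vw} \geq 0 \wedge \bigwedge_{w \in V \setminus v\Delta} \alpha_{vw} = 0 \wedge \sum_{w \in v\Delta} \alpha_{vw} = 1 \Big) \wedge \bigwedge_{v \in V \setminus V_*,\, w \in V} \alpha_{vw} = p_{vw} \wedge \bigwedge_{(v,w) \in S} \alpha_{vw} > 0 \wedge \bigwedge_{(v,w) \notin S} \alpha_{vw} = 0,$$

where $p_{vw}$ is the unique number such that $(v, p_{vw}, w) \in \Delta$, states that the
mapping $\overline{\sigma} : V \to \mathcal{D}(V)$ defined by $\overline{\sigma}(w \mid v) = \alpha_{vw}$ constitutes a valid
stationary strategy profile of $\mathcal{G}$ whose support is $S$. Provided that $\varphi(\overline{\alpha})$ holds
in $\mathfrak{R}$, the formula

$$\eta_i(\overline{\alpha}, \overline{z}) := \bigwedge_{v \in F_i} z^i_v = 1 \wedge \bigwedge_{v \in V \setminus R_i} z^i_v = 0 \wedge \bigwedge_{v \in V \setminus F_i} z^i_v = \sum_{w \in v\Delta} \alpha_{vw} z^i_w$$

states that $z^i_v = \Pr^{\overline{\sigma}}_{v}(\mathrm{Reach}(F_i))$ for each $v \in V$, where $\overline{\sigma}$ is defined as above.
Again, this follows from a well-known result about Markov chains, namely
that the vector of the aforementioned probabilities is the unique solution to
the given system of equations. Finally, the formula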

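$$\theta_i(\overline{\alpha}, \overline{r}) := \bigwedge_{v \in V} r^i_v \geq 0 \wedge \bigwedge_{v \in F_i} r^i_v = 1 \wedge \bigwedge_{v \in V_i,\, w \in v\Delta} r^i_v \geq r^i_w \wedge \bigwedge_{v \in V \setminus V_i} r^i_v = \sum_{w \in v\Delta} \alpha_{vw} r^i_w$$

states that $\overline{r}$ is a solution of the linear programme for computing the maximal
payoff that player $i$ can achieve when playing against the strategy profile $\overline{\sigma}_{-i}$.
In particular, the formula is fulfilled if $r^i_v = \sup_\tau \Pr^{(\overline{\sigma}_{-i}, \tau)}_{v}(\mathrm{Reach}(F_i))$ (where
$\tau$ ranges over every strategy of player $i$), and every other solution is greater
than this one (in each component).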

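The desired sentence $\psi$ is the existential closure of the conjunction of
$\varphi$ and, for each player $i$, the formulae $\eta_i$ and $\theta_i$ combined with formulae
stating that player $i$ cannot improve her payoff and that the expected payoff
for player $i$ lies in between the given thresholds:

$$\psi := \exists \overline{\alpha}\, \exists \overline{r}\, \exists \overline{z}\, \Big( \varphi(\overline{\alpha}) \wedge \bigwedge_{i \in \Pi} \big( \eta_i(\overline{\alpha}, \overline{z}) \wedge \theta_i(\overline{\alpha}, \overline{r}) \wedge r^i_{v_0} \leq z^i_{v_0} \wedge x_i \leq z^i_{v_0} \leq y_i \big) \Big)$$

It follows that $\psi$ holds in $\mathfrak{R}$ iff $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium $\overline{\sigma}$
with payoff at least $x$ and at most $y$ whose support is $S$. Consequently, the
algorithm is correct. q.e.d.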
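
The only graph computation in this algorithm, determining for a guessed set $S$ the set $R_i$ of vertices from which $F_i$ is reachable in $(V, S)$, is a standard backward search. A minimal Python sketch follows; the function name `can_reach` and the edge-set encoding are assumptions made for this example.

```python
from collections import deque


def can_reach(edges, targets):
    """Return all vertices from which some vertex in `targets` is reachable.

    `edges` is an iterable of pairs (v, w), playing the role of the guessed
    support S; `targets` plays the role of F_i.
    """
    predecessors = {}
    for v, w in edges:
        predecessors.setdefault(w, set()).add(v)
    seen = set(targets)
    queue = deque(targets)
    while queue:
        w = queue.popleft()
        for v in predecessors.get(w, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen
```

Each edge is inspected a constant number of times, so the search runs in time linear in $|V| + |S|$, well within the polynomial bound the proof requires.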

4.2 Lower bounds 

Having shown that PosNE and StatNE are in NP and PSpace respectively,
the natural question arises whether there is a polynomial-time algorithm
for PosNE or StatNE. The following theorem shows that this is not the case
(unless, of course, P = NP) since both problems are NP-hard. Moreover, both
problems are already NP-hard for games with only two players.

Theorem 5. PosNE and StatNE are NP-hard, even for games with only two
players.

Proof. The proof is by reduction from SAT. Let $\varphi = C_1 \wedge \cdots \wedge C_m$ be a formula
in conjunctive normal form over propositional variables $X_1, \ldots, X_n$. Our aim
is to construct a two-player SSMG $(\mathcal{G}_\varphi, v_0)$ such that the following statements
are equivalent:

1. $\varphi$ is satisfiable;

2. $(\mathcal{G}_\varphi, v_0)$ has a positional Nash equilibrium with payoff $(1, \frac{1}{2})$;

3. $(\mathcal{G}_\varphi, v_0)$ has a stationary Nash equilibrium with payoff $(1, \frac{1}{2})$.

Provided that the game can be constructed in polynomial time, the
equivalence of 1. and 2. establishes a polynomial-time reduction from SAT to PosNE,
whereas the equivalence of 1. and 3. establishes one from SAT to StatNE. The
game $\mathcal{G}_\varphi$ is depicted in Figure 3 and played by players 0 and 1. The game
proceeds from the initial vertex $v_0$ to $X_i$ or $\neg X_i$ with probability $\frac{1}{2^{i+1}}$ each, and
there is an edge from vertex $C_j$ to vertex $X_i$ or $\neg X_i$ iff $X_i$ or $\neg X_i$ respectively
occurs in the clause $C_j$. Also, from T-labelled vertices player 1 can "leave the
game" by moving to a terminal vertex with payoff (0, 1). Obviously, the game
$\mathcal{G}_\varphi$ can be constructed from $\varphi$ in polynomial time. It remains to show that
1.-3. are equivalent.

(1. $\Rightarrow$ 2.) Assume that $\alpha : \{X_1, \ldots, X_n\} \to \{\mathrm{true}, \mathrm{false}\}$ is a satisfying
assignment of $\varphi$. In the positional Nash equilibrium of $(\mathcal{G}_\varphi, v_0)$, player 0 moves
from a literal $L$ (i.e. $L = X_i$ or $L = \neg X_i$ for some $i = 1, \ldots, n$) to the T-labelled
vertex iff $L$ is mapped to true by $\alpha$, and player 1 moves from vertex $C_j$ to a
(fixed) literal $L$ that is contained in $C_j$ and mapped to true by $\alpha$ (which is
possible since $\alpha$ is a satisfying assignment). At T-labelled vertices, player 1
never leaves the game. Obviously, player 0 wins almost surely with this
strategy profile. For player 1, the payoff is

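$$\frac{1}{2^{n+1}} + \sum_{i=1}^{n} \frac{1}{2^{i+1}} = \frac{1}{2^{n+1}} + \frac{1}{2}\Big(1 - \frac{1}{2^{n}}\Big) = \frac{1}{2},$$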

where the first summand is the probability of going from the initial vertex to
$\varphi$, from where player 1 wins almost surely since from every clause vertex she
plays to a "true" literal. Obviously, changing her strategy cannot give her a
better payoff. Therefore, we have a Nash equilibrium.

(2. $\Rightarrow$ 3.) Obvious.

(3. $\Rightarrow$ 1.) Let $\overline{\sigma} = (\sigma_0, \sigma_1)$ be a stationary Nash equilibrium of $(\mathcal{G}_\varphi, v_0)$
with payoff $(1, \frac{1}{2})$. Our first aim is to show that $\sigma_0$ is actually a positional
strategy. Towards a contradiction, assume that there exists a literal $L$ such
that $\sigma_0(L)$ assigns probability $0 < q < 1$ to the neighbouring T-labelled
vertex. Since player 0 wins almost surely, player 1 never leaves the game.
Hence, the expected payoff for player 1 from vertex $L$ (i.e. in the game $(\mathcal{G}_\varphi, L)$)
is precisely $q$. However, if she left the game at the T-labelled vertex, she
would receive a payoff strictly greater than $q$. Therefore, $\overline{\sigma}$ is not a Nash
equilibrium, a contradiction.

Knowing that $\sigma_0$ is a positional strategy, we can define a pseudo assignment
$\alpha : \{X_1, \neg X_1, \ldots, X_n, \neg X_n\} \to \{\mathrm{true}, \mathrm{false}\}$ by setting $\alpha(L) = \mathrm{true}$ if $\sigma_0$
prescribes to go from vertex $L$ to the neighbouring T-labelled vertex. Our next
aim is to show that $\alpha$ is actually an assignment: $\alpha(X_i) = \mathrm{true} \Leftrightarrow \alpha(\neg X_i) = \mathrm{false}$.
To see this, note that we can compute player 1's expected payoff as
follows:

$$\frac{1}{2} = \frac{p}{2^{n+1}} + \sum_{i=1}^{n} \frac{a_i}{2^{i+1}}, \qquad a_i = \begin{cases} 0 & \text{if } \alpha(X_i) = \alpha(\neg X_i) = \mathrm{false}, \\ 1 & \text{if } \alpha(X_i) \neq \alpha(\neg X_i), \\ 2 & \text{if } \alpha(X_i) = \alpha(\neg X_i) = \mathrm{true}, \end{cases}$$

where $p$ is the expected payoff for player 1 from vertex $\varphi$. By the construction
of $\mathcal{G}_\varphi$, we have $p > 0$, and the equality only holds if $p = 1$ and $a_i = 1$ for all
$i = 1, \ldots, n$, which proves that $\alpha$ is an assignment.

Finally, we claim that $\alpha$ is a satisfying assignment. If this were not the
case, there would exist a clause $C$ such that player 1's expected payoff from
vertex $C$ is 0 and therefore $p < 1$, where $p$ is defined as above. This is a
contradiction to the fact that $p = 1$, as we have shown above. q.e.d.



Remark. The reduction in the proof of Theorem 5 can be modified to
demonstrate NP-hardness of the threshold-free variants of PosNE and StatNE, albeit
at the expense of adding one more player to the game.

It follows from Theorems 3 and 5 that PosNE is NP-complete. For StatNE,
we have provided an NP lower bound and a PSpace upper bound, but the
exact complexity of the problem remains unclear. Towards gaining more
insight into the problem StatNE, we relate its complexity to the complexity
of the Square Root Sum Problem (SqrtSum), the problem of deciding, given
numbers $d_1, \ldots, d_n, k \in \mathbb{N}$, whether $\sum_{i=1}^{n} \sqrt{d_i} \geq k$. Recently, it was shown that
SqrtSum belongs to the 4th level of the counting hierarchy [1], which is a slight
improvement over the previously known PSpace upper bound. However, it has
been an open question since the 1970s whether SqrtSum falls into the polynomial
hierarchy [16, 14]. We identify a polynomial-time reduction from SqrtSum
to StatNE.^3 Hence, StatNE is at least as hard as SqrtSum, and showing
that StatNE resides inside the polynomial hierarchy would imply a major
breakthrough in understanding the complexity of numerical computation.
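
To see why SqrtSum resists easy classification, consider the obvious numerical approach, sketched below in Python. It computes integer lower and upper bounds on the scaled sum at a fixed precision and answers only when these bounds settle the comparison. This is a heuristic illustration, not an algorithm from the paper: the function name `sqrt_sum_at_least` and the parameter `bits` are assumptions, and the sketch returns `None` when the chosen precision is insufficient. The difficulty of the problem lies precisely in the fact that no polynomial bound on the precision needed is known.

```python
from math import isqrt


def sqrt_sum_at_least(ds, k, bits=256):
    """Try to decide whether sum(sqrt(d) for d in ds) >= k.

    Works with `bits` fractional binary digits. Returns True or False when
    the fixed-precision bounds decide the question, and None otherwise.
    """
    scale = 1 << (2 * bits)
    # isqrt(d * 4**bits) equals floor(sqrt(d) * 2**bits)
    low = sum(isqrt(d * scale) for d in ds)
    high = low + len(ds)  # each floor loses less than 1, so the true value is < high
    target = k << bits
    if low >= target:
        return True
    if high <= target:
        return False
    return None  # precision too low to tell
```

For instance, `sqrt_sum_at_least([4, 9], 5)` is decided immediately because both roots are integers, whereas instances whose sum lies extremely close to `k` would need ever larger values of `bits`.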

Theorem 6. SqrtSum is polynomial-time reducible to StatNE.

Proof. Given an instance $(d_1, \ldots, d_n, k)$ of SqrtSum, we construct an
SSMG $(\mathcal{G}, v_0)$ played by players 0, 1, 2, 3 (with payoffs given in this order)
such that $\sum_{i=1}^{n} \sqrt{d_i} \geq k$ iff $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium where
player 0 wins almost surely.

In order to state our reduction, let us first examine the game G{p), where 



p G [j / 1)/ which is depicted in Figure 4 (b) 



Claim 7. The maximal payoff player 3 can receive in a stationary Nash
equilibrium of $(\mathcal{G}(p), s)$ is $\frac{\sqrt{2-2p} - p + 1}{2p + 2}$.

Proof. Let $\sigma$ be any stationary strategy profile of $(\mathcal{G}(p), s)$.
We denote by $x_1$ and $x_2$ the probabilities that player 0 stays inside the
gadget at vertex $s_1$ and vertex $s_2$ respectively. Consequently, the
probabilities of eventually leaving the gadget at vertex $s_1$ and vertex $s_2$
are given by $p_1(x_1, x_2) := \frac{p(1-x_1)}{1 - x_1 x_2 p^2}$ and
$p_2(x_1, x_2) := \frac{p(1-x_2)}{1 - x_1 x_2 p^2}$ respectively. Note that if
$x_1 = 0$, then $\sigma$ is a Nash equilibrium where player 3 receives payoff
$\le 1 - p \le \frac{\sqrt{2-2p} - p + 1}{2p + 2}$. Hence, let us assume that
$x_1 > 0$ and look for a Nash equilibrium where player 3 receives



¹ Some authors define SqrtSum with $\le$ instead of $\ge$. With this definition,
we would reduce from the complement of SqrtSum instead.
[Figure 4. Reducing SqrtSum to StatNE: (a) the complete game $\mathcal{G}$;
(b) the gadget $\mathcal{G}(p)$.]



payoff $> 1 - p$. For this, it must be the case that
$p_1(x_1, x_2), p_2(x_1, x_2) \ge \frac{1}{2}$ since otherwise player 1 or
player 2 could improve her payoff by moving out of the gadget, where she would
get payoff $\frac{1}{2}$ immediately (and player 3 would receive payoff
$\le 1 - p$). Vice versa, if $p_1(x_1, x_2), p_2(x_1, x_2) \ge \frac{1}{2}$,
then $\sigma$ is obviously a Nash equilibrium. Hence, to determine the maximum
payoff for player 3 in a stationary Nash equilibrium, we have to maximise
$\frac{1-p}{1 - x_1 x_2 p^2}$, the expected payoff for player 3, under the
constraints $p_1(x_1, x_2), p_2(x_1, x_2) \ge \frac{1}{2}$ and
$0 \le x_1, x_2 \le 1$. We claim that the maximum is reached only if
$x_1 = x_2$; if, for example, $x_1 > x_2$, then we can achieve a higher payoff
for player 3 by setting $x_2' := x_1$, and the constraints are still satisfied:



$$p_1(x_1, x_2') = p_2(x_1, x_2') = \frac{p(1-x_1)}{1 - x_1^2 p^2} \ge \frac{p(1-x_1)}{1 - x_1 x_2 p^2} = p_1(x_1, x_2) \ge \frac{1}{2}.$$



Hence, in fact, we have to maximise $\frac{1-p}{1 - x^2 p^2}$ under the
constraints $\frac{p(1-x)}{1 - x^2 p^2} \ge \frac{1}{2}$ and $0 \le x \le 1$,
i.e. under $p^2 x^2 - 2px + 2p - 1 \ge 0$ and $0 \le x \le 1$. The roots of
$p^2 x^2 - 2px + 2p - 1$ are $\frac{1 \pm \sqrt{2-2p}}{p}$, but
$\frac{1 + \sqrt{2-2p}}{p}$ is always greater than 1 for $p \in [0, 1)$. Hence,
any solution must be at most $x := \frac{1 - \sqrt{2-2p}}{p}$. In fact, we
always have $0 \le x < 1$ for $p \in [\frac{1}{2}, 1)$. Therefore, $x$ is the
optimal solution, and the maximal payoff for player 3 is indeed

$$\frac{1-p}{1 - p^2 x^2} = \frac{\sqrt{2-2p} - p + 1}{2p + 2}.$$
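The closed form in Claim 7 can be sanity-checked numerically. The sketch below assumes, as the quadratic $p^2x^2 - 2px + 2p - 1$ in the proof suggests, that in the symmetric case $x_1 = x_2 = x$ the constraint reads $p(1-x)/(1-p^2x^2) \ge 1/2$ and that player 3's payoff is $(1-p)/(1-p^2x^2)$; all function names are illustrative.

```python
from math import sqrt


def optimal_stay_probability(p):
    """Smaller root of p^2 x^2 - 2 p x + 2 p - 1, for p in [1/2, 1)."""
    return (1 - sqrt(2 - 2 * p)) / p


def constraint_value(p, x):
    """Assumed symmetric constraint term p(1 - x) / (1 - p^2 x^2)."""
    return p * (1 - x) / (1 - (p * x) ** 2)


def payoff_player3(p, x):
    """Assumed symmetric payoff of player 3: (1 - p) / (1 - p^2 x^2)."""
    return (1 - p) / (1 - (p * x) ** 2)


def closed_form(p):
    """Maximal payoff stated in Claim 7."""
    return (sqrt(2 - 2 * p) - p + 1) / (2 * p + 2)
```

Under these assumptions, the constraint is tight at the optimal $x$ (its value is exactly $1/2$), and `payoff_player3(p, optimal_stay_probability(p))` agrees with `closed_form(p)` up to rounding for every $p \in [\frac{1}{2}, 1)$.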



Finally, we can set up our reduction. Let $(d_1, \ldots, d_n, k)$ be an instance
of SqrtSum where, without loss of generality, $n > 0$, $d_i > 0$ for each
$i = 1, \ldots, n$, and $k \le d := \sum_{i=1}^n d_i$. Define
$p_i := 1 - \frac{d_i}{2d^2}$ for $i = 1, \ldots, n$. Note that
$p_i \in [\frac{1}{2}, 1)$ since $0 < d_i \le d \le d^2$. For the reduction, we
use $n$ copies of the game $\mathcal{G}(p)$, where in the $i$th copy we set $p$
to $p_i$. The complete game $\mathcal{G}$ is depicted in Figure 4 (a); it can
obviously be constructed in polynomial time.
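By the above claim, the maximal payoff player 3 can get in a stationary
Nash equilibrium of $(\mathcal{G}(p_i), s)$ is

$$\frac{\sqrt{2-2p_i} - p_i + 1}{2p_i + 2} = \frac{2d\sqrt{d_i} + d_i}{2(4d^2 - d_i)}.$$

Consequently, the maximal payoff player 3 can get in a stationary Nash
equilibrium of $(\mathcal{G}, v_1)$ is

$$\sum_{i=1}^n \frac{4d^2 - d_i}{4d^2 n} \cdot \frac{2d\sqrt{d_i} + d_i}{2(4d^2 - d_i)} = \sum_{i=1}^n \frac{\sqrt{d_i}}{4dn} + \sum_{i=1}^n \frac{d_i}{8d^2 n} = \sum_{i=1}^n \frac{\sqrt{d_i}}{4dn} + \frac{1}{8dn}.$$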

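Let us fix a stationary Nash equilibrium $\sigma$ of $(\mathcal{G}, v_1)$ with
this payoff for player 3.

Now, if $\sum_{i=1}^n \sqrt{d_i} \ge k$, then also
$\sum_{i=1}^n \frac{\sqrt{d_i}}{4dn} + \frac{1}{8dn} \ge \frac{k}{4dn} + \frac{1}{8dn}$,
and $\sigma$ can be extended to a stationary Nash equilibrium of
$(\mathcal{G}, v_0)$ where player 0 wins almost surely by setting
$\sigma(v_1 \mid v_0) = 1$. On the other hand, if $\sum_{i=1}^n \sqrt{d_i} < k$,
then also
$\sum_{i=1}^n \frac{\sqrt{d_i}}{4dn} + \frac{1}{8dn} < \frac{k}{4dn} + \frac{1}{8dn}$,
and in every stationary Nash equilibrium of $(\mathcal{G}, v_0)$ player 3 leaves
the game at $v_0$, which gives payoff 0 to player 0. q.e.d.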
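The arithmetic behind the reduction can be illustrated with a small sketch. It assumes $p_i = 1 - d_i/(2d^2)$, that copy $i$ contributes with weight $(p_i+1)/(2n)$, and that player 3 compares the result with the threshold $(2k+1)/(8dn)$; the weight and the threshold are read off the damaged formulas and should be treated as assumptions, as should all function names.

```python
from fractions import Fraction
from math import sqrt


def reduction_parameters(ds, k):
    """Return the gadget parameters p_i and the assumed threshold for player 3."""
    n, d = len(ds), sum(ds)
    ps = [1 - Fraction(di, 2 * d * d) for di in ds]   # p_i = 1 - d_i / (2 d^2)
    threshold = Fraction(2 * k + 1, 8 * d * n)        # k/(4dn) + 1/(8dn)
    return ps, threshold


def max_payoff_player3(ds):
    """Assumed maximal stationary-equilibrium payoff of player 3 from v_1."""
    n, d = len(ds), sum(ds)
    total = 0.0
    for di in ds:
        p = 1 - di / (2 * d * d)
        gadget = (sqrt(2 - 2 * p) - p + 1) / (2 * p + 2)  # closed form of Claim 7
        total += (p + 1) / 2 * gadget / n                 # assumed weight of copy i
    return total
```

For the instance $d = (4, 9)$, $k = 5$ the sum of square roots equals $k$ exactly, and the computed payoff coincides with the threshold $11/208$ up to rounding; for larger $k$ the payoff falls below the threshold.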



5 Undecidable variants of NE 

5.1 Pure-strategy equilibria 

In this section, we show that the problem PureNE is undecidable by exhibiting
a reduction from an undecidable problem about two-counter machines. Our
construction is inspired by a construction used by Brázdil et al. [2] to prove
the undecidability of stochastic games with branching-time winning conditions.

A two-counter machine $\mathcal{M}$ is given by a list of instructions
$\iota_1, \ldots, \iota_m$ where each instruction is one of the following:

• "inc(j); goto k" (increment counter j by 1 and go to instruction number k);

• "zero(j) ? goto k : dec(j); goto l" (if the value of counter j is zero, go to
instruction number k; otherwise, decrement counter j by one and go to
instruction number l);

• "halt" (stop the computation).

Here $j$ ranges over 1, 2 (the two counters), and $k \neq l$ range over
$1, \ldots, m$. A configuration of $\mathcal{M}$ is a triple
$C = (i, c_1, c_2) \in \{1, \ldots, m\} \times \mathbb{N} \times \mathbb{N}$, where
$i$ denotes the number of the current instruction and $c_j$ denotes the current
value of counter $j$. A configuration $C'$ is the successor of configuration
$C$, denoted by $C \vdash C'$, if it results from $C$ by executing instruction
$\iota_i$; a configuration $C = (i, c_1, c_2)$ with $\iota_i$ = "halt" has no
successor configuration. Finally, the computation of $\mathcal{M}$ is the unique
maximal sequence $\rho = \rho(0)\rho(1)\ldots$ such that
$\rho(0) \vdash \rho(1) \vdash \ldots$ and $\rho(0) = (1, 0, 0)$ (the initial
configuration). Note that $\rho$ is either infinite, or it ends in a
configuration $C = (i, c_1, c_2)$ such that $\iota_i$ = "halt".
The halting problem is to decide, given a machine $\mathcal{M}$, whether the
computation of $\mathcal{M}$ is finite. It is well-known that two-counter
machines are Turing-powerful, which makes the halting problem and its dual,
the non-halting problem, undecidable.
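The definition of configurations and successors can be sketched as a small interpreter. The class and function names (`Inc`, `ZeroDec`, `Halt`, `successor`, `computation_prefix`) are illustrative and not taken from the paper; since the computation may be infinite, the sketch only returns a bounded prefix of it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Inc:            # "inc(j); goto k"
    j: int
    k: int


@dataclass(frozen=True)
class ZeroDec:        # "zero(j) ? goto k : dec(j); goto l"
    j: int
    k: int
    l: int


@dataclass(frozen=True)
class Halt:           # "halt"
    pass


Instruction = Union[Inc, ZeroDec, Halt]
Config = Tuple[int, int, int]   # (i, c1, c2), instructions numbered from 1


def successor(machine: List[Instruction], config: Config) -> Optional[Config]:
    """Return the successor configuration, or None at a halt instruction."""
    i, c1, c2 = config
    instr = machine[i - 1]
    if isinstance(instr, Halt):
        return None
    counters = {1: c1, 2: c2}
    if isinstance(instr, Inc):
        counters[instr.j] += 1
        nxt = instr.k
    elif counters[instr.j] == 0:
        nxt = instr.k           # zero test succeeds, counter unchanged
    else:
        counters[instr.j] -= 1
        nxt = instr.l
    return (nxt, counters[1], counters[2])


def computation_prefix(machine: List[Instruction], max_steps: int) -> List[Config]:
    """First configurations of the computation, starting from (1, 0, 0)."""
    trace: List[Config] = [(1, 0, 0)]
    for _ in range(max_steps):
        nxt = successor(machine, trace[-1])
        if nxt is None:
            break
        trace.append(nxt)
    return trace
```

For example, the one-instruction machine `[Inc(1, 1)]` never halts, so `computation_prefix` always uses up its full step budget on it.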

Theorem 8. PureNE is undecidable. 



In order to prove Theorem 8, we show that one can compute from a
two-counter machine $\mathcal{M}$ an SSMG $(\mathcal{G}, v_0)$ with nine players
such that the computation of $\mathcal{M}$ is infinite iff $(\mathcal{G}, v_0)$
has a pure Nash equilibrium where player 0 wins almost surely. This establishes
a reduction from the non-halting problem to PureNE.

The game $\mathcal{G}$ is played by player 0 and eight other players $A_j^t$ and
$B_j^t$, indexed by $j \in \{1, 2\}$ and $t \in \{0, 1\}$. Let
$\Gamma = \{\mathrm{init}, \mathrm{inc}(j), \mathrm{dec}(j), \mathrm{zero}(j) : j = 1, 2\}$.
If $\mathcal{M}$ has instructions $\iota_1, \ldots, \iota_m$, then for each
$i \in \{1, \ldots, m\}$, each $\gamma \in \Gamma$, each $j \in \{1, 2\}$ and each
$t \in \{0, 1\}$, the game $\mathcal{G}$ contains the gadgets $S^t_{i,\gamma}$,
$I^t_{i,\gamma}$ and $C^t_{j,\gamma}$, which are depicted in Figure 5. In the
figure, squares represent terminal vertices (the edge leading from a terminal
vertex to itself being implicit), and the labelling indicates which players win
at the respective vertex. Moreover, the dashed edge inside $C^t_{j,\gamma}$ is
present iff $\gamma \notin \{\mathrm{init}, \mathrm{zero}(j)\}$. The initial
vertex $v_0$ of $\mathcal{G}$ is the black vertex inside the gadget
$S^0_{1,\mathrm{init}}$.

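For any pure strategy profile $\sigma$ of $\mathcal{G}$ where player 0 wins
almost surely, let $x_0 v_0 \prec x_1 v_1 \prec x_2 v_2 \prec \cdots$
($x_i \in V^*$, $v_i \in V$, $x_0 = \varepsilon$) be the (unique) sequence of all
consecutive histories such that, for each $n \in \mathbb{N}$, $v_n$ is a black
vertex and $\Pr^{\sigma}_{v_0}(x_n v_n \cdot V^\omega) > 0$. Additionally, let
$\gamma_0, \gamma_1, \ldots$ be the corresponding sequence of instructions, i.e.
$\gamma_n = \gamma$ for the unique instruction $\gamma$ such that $v_n$ lies in
one of the gadgets $S^t_{i,\gamma}$ (where $t = n \bmod 2$). For each
$j \in \{1, 2\}$ and $n \in \mathbb{N}$, we define two conditional probabilities
$a_j^n$ and $p_j^n$ as follows:

$$a_j^n := \Pr^{\sigma}_{v_0}(\mathrm{Reach}(F_{A_j^{n \bmod 2}}) \mid x_n v_n \cdot V^\omega)$$

and

$$p_j^n := \Pr^{\sigma}_{v_0}(\mathrm{Reach}(F_{A_j^{n \bmod 2}}) \mid x_n v_n \cdot V^\omega \setminus x_{n+2} v_{n+2} \cdot V^\omega).$$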

Finally, for each $j \in \{1, 2\}$ and $n \in \mathbb{N}$, we define an ordinal number $c_j^n \le \omega$
as follows: After the history with probability g the play proceeds to the 
vertex controlled by player 0 in the counter gadget $C^t_{j,\gamma_n}$ (where
$t = n \bmod 2$). The number $c_j^n$ is defined to be the maximal number of
subsequent visits to the grey vertex inside this gadget (where
$c_j^n = \omega$ if, on one path, the grey vertex is visited infinitely often).
Note that, by the construction of $C^t_{j,\gamma}$, it holds that $c_j^n = 0$ if
$\gamma_n = \mathrm{zero}(j)$ or $\gamma_n = \mathrm{init}$.

[Figure 5. Simulating a two-counter machine: the gadgets $S^t_{i,\gamma}$,
$I^t_{i,\gamma}$ and $C^t_{j,\gamma}$.]

Lemma 9. Let $\sigma$ be a pure strategy profile of $(\mathcal{G}, v_0)$ where
player 0 wins almost surely. Then $\sigma$ is a Nash equilibrium if and only if

$$c_j^{n+1} = \begin{cases} 1 + c_j^n & \text{if } \gamma_{n+1} = \mathrm{inc}(j),\\ c_j^n - 1 & \text{if } \gamma_{n+1} = \mathrm{dec}(j),\\ c_j^n = 0 & \text{if } \gamma_{n+1} = \mathrm{zero}(j),\\ c_j^n & \text{otherwise} \end{cases} \tag{1}$$

for all $j \in \{1, 2\}$ and $n \in \mathbb{N}$.

Here $+$ and $-$ denote the usual addition and subtraction of ordinal numbers
respectively (satisfying $1 + \omega = \omega - 1 = \omega$). The proof of
Lemma 9 goes through several claims. In the following, let $\sigma$ be a pure
strategy profile of $(\mathcal{G}, v_0)$ where player 0 wins almost surely. The
first claim gives a necessary and sufficient condition on the probabilities
$a_j^n$ for $\sigma$ to be a Nash equilibrium.
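The counter-update condition of Lemma 9 can be written as a small predicate on one counter. In this sketch the ordinal $\omega$ is represented by `math.inf`, which happens to satisfy the same identities $1 + \omega = \omega - 1 = \omega$; the function name and the string encoding of instructions are illustrative assumptions.

```python
import math

OMEGA = math.inf  # stand-in for the ordinal omega: inf + 1 == inf - 1 == inf


def satisfies_update(c_now, c_next, instruction):
    """Check the update condition for one counter and one step.

    `instruction` is "inc", "dec" or "zero" when the next instruction acts on
    this counter, and "other" when it does not (hypothetical encoding).
    """
    if instruction == "inc":
        return c_next == 1 + c_now
    if instruction == "dec":
        # A counter at zero cannot be decremented.
        return c_now >= 1 and c_next == c_now - 1
    if instruction == "zero":
        return c_now == 0 and c_next == 0
    return c_next == c_now
```

Note that the pseudo-value `OMEGA` passes both the increment and the decrement check against itself, which is why the later argument has to rule out infinite counter values separately by induction.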

Claim. The profile $\sigma$ is a Nash equilibrium iff $a_j^n = \frac{1}{3}$ for
all $j \in \{1, 2\}$ and $n \in \mathbb{N}$.

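Proof. ($\Rightarrow$) Assume that $\sigma$ is a Nash equilibrium. Clearly, this
implies that $a_j^n \ge \frac{1}{3}$ for all $n \in \mathbb{N}$ since otherwise
some player $A_j^t$ could improve her payoff by leaving one of the gadgets
$S^t_{i,\gamma}$. Let

$$b_j^n := \Pr^{\sigma}_{v_0}(\mathrm{Reach}(F_{B_j^{n \bmod 2}}) \mid x_n v_n \cdot V^\omega).$$

We have $b_j^n \ge \frac{1}{3}$ for all $n \in \mathbb{N}$ since otherwise some
player $B_j^t$ could improve her payoff by leaving one of the gadgets
$S^t_{i,\gamma}$. Note that at every terminal vertex of the counter gadgets
$C^t_{j,\gamma}$ and $C^{1-t}_{j,\gamma}$ either player $A_j^t$ or player $B_j^t$
wins. The conditional probability that, given the history $x_n v_n$, we reach
one of those gadgets is
$\sum_{k \in \mathbb{N}} \frac{1}{4^k} \cdot \frac{1}{2} = \frac{2}{3}$ for all
$n \in \mathbb{N}$, so we have $a_j^n = \frac{2}{3} - b_j^n$ for all
$n \in \mathbb{N}$. Since $b_j^n \ge \frac{1}{3}$, we arrive at
$a_j^n \le \frac{2}{3} - \frac{1}{3} = \frac{1}{3}$, which proves the claim.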

($\Leftarrow$) Assume that $a_j^n = \frac{1}{3}$ for all $n \in \mathbb{N}$.
Clearly, this implies that none of the players $A_j^t$ can improve her payoff.
To show that none of the players $B_j^t$ can improve her payoff, it suffices to
show that $b_j^n \ge \frac{1}{3}$ for all $n \in \mathbb{N}$. But with the same
argumentation as above, we have $b_j^n = \frac{2}{3} - a_j^n$ and thus
$b_j^n = \frac{1}{3}$ for all $n \in \mathbb{N}$, which proves the claim. q.e.d.

The second claim relates the probabilities $a_j^n$ and $p_j^n$.

Claim. Let $j \in \{1, 2\}$. Then $a_j^n = \frac{1}{3}$ for all
$n \in \mathbb{N}$ if and only if $p_j^n = \frac{1}{4}$ for all $n \in \mathbb{N}$.

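Proof. ($\Rightarrow$) Assume that $a_j^n = \frac{1}{3}$ for all
$n \in \mathbb{N}$. We have $a_j^n = p_j^n + \frac{1}{4} \cdot a_j^{n+2}$ and
therefore $\frac{1}{3} = p_j^n + \frac{1}{12}$ for all $n \in \mathbb{N}$. Hence,
$p_j^n = \frac{1}{4}$ for all $n \in \mathbb{N}$.

($\Leftarrow$) Assume that $p_j^n = \frac{1}{4}$ for all $n \in \mathbb{N}$.
Since $a_j^n = p_j^n + \frac{1}{4} \cdot a_j^{n+2}$ for all $n \in \mathbb{N}$,
the numbers $a_j^n$ must satisfy the following recurrence:
$a_j^{n+2} = 4a_j^n - 1$. Since all the numbers $a_j^n$ are probabilities, we
have $0 \le a_j^n \le 1$ for all $n \in \mathbb{N}$. It is easy to see that the
only values for $a_j^0$ and $a_j^1$ such that $0 \le a_j^n \le 1$ for all
$n \in \mathbb{N}$ are $a_j^0 = a_j^1 = \frac{1}{3}$. But this implies that
$a_j^n = \frac{1}{3}$ for all $n \in \mathbb{N}$. q.e.d.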
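The "easy to see" step can be illustrated with exact rational arithmetic: under the recurrence $a \mapsto 4a - 1$, the distance from the fixed point $\frac{1}{3}$ is multiplied by 4 in every step, so any other starting value eventually leaves $[0, 1]$. The helper names below are illustrative.

```python
from fractions import Fraction


def iterate(a0, steps):
    """Apply the map a -> 4a - 1 repeatedly, starting from a0."""
    values = [Fraction(a0)]
    for _ in range(steps):
        values.append(4 * values[-1] - 1)
    return values


def stays_probability(a0, steps=50):
    """True iff the first `steps` iterates all remain inside [0, 1]."""
    return all(0 <= a <= 1 for a in iterate(a0, steps))
```

Starting exactly at $\frac{1}{3}$ the sequence is constant, whereas even a perturbation of $10^{-6}$ is amplified by a factor of $4^{50}$ after 50 steps.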
Finally, the last claim relates the numbers $p_j^n$ to (1).

Claim. Let $j \in \{1, 2\}$. Then $p_j^n = \frac{1}{4}$ for all
$n \in \mathbb{N}$ if and only if (1) holds for all $n \in \mathbb{N}$.



Proof. Let $n \in \mathbb{N}$, and let $t = n \bmod 2$. The probability $p_j^n$
can be expressed as the sum of the probability that the play reaches a terminal
vertex that is winning for player $A_j^t$ inside $C^t_{j,\gamma_n}$ and the
probability that the play reaches such a vertex inside
$C^{1-t}_{j,\gamma_{n+1}}$. The first probability does not depend on
$\gamma_n$, but the second depends on $\gamma_{n+1}$. Let us consider the case
that $\gamma_{n+1} = \mathrm{inc}(j)$. In this case, the aforementioned sum is
equal to the following sum of two binary numbers:

$$0.00\underbrace{1 \ldots 1}_{c_j^n \text{ times}}111 \; + \; 0.000\underbrace{0 \ldots 0}_{c_j^{n+1} \text{ times}}100.$$

Obviously, this sum is equal to $\frac{1}{4}$ iff $c_j^{n+1} = 1 + c_j^n$. For
any other value of $\gamma_{n+1}$, the argumentation is similar, and we omit it
here. q.e.d.
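The binary-number argument for the increment case can be checked mechanically for finite counter values. The sketch assumes the digit layout suggested by the display: the first summand has ones at binary positions $3, \ldots, c_j^n + 5$, and the second has a single one at position $c_j^{n+1} + 4$. The function names are illustrative.

```python
from fractions import Fraction


def first_summand(c_now):
    """0.00 1...1 111 in binary, with c_now ones before the final 111."""
    return sum(Fraction(1, 2 ** pos) for pos in range(3, c_now + 6))


def second_summand(c_next):
    """0.000 0...0 100 in binary, with c_next zeros before the final 100."""
    return Fraction(1, 2 ** (c_next + 4))


def p_for_increment(c_now, c_next):
    """Assumed value of p_j^n when the next instruction is inc(j)."""
    return first_summand(c_now) + second_summand(c_next)
```

Since the first summand equals $\frac{1}{4} - 2^{-(c_j^n + 5)}$, the total is $\frac{1}{4}$ exactly when the single one of the second summand sits at position $c_j^n + 5$, that is, when $c_j^{n+1} = 1 + c_j^n$.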



Proof of Lemma 9. By the first claim, the profile $\sigma$ is a Nash equilibrium
iff $a_j^n = \frac{1}{3}$ for all $j \in \{1, 2\}$ and $n \in \mathbb{N}$. By the
second claim, the latter is true iff $p_j^n = \frac{1}{4}$ for all
$j \in \{1, 2\}$ and $n \in \mathbb{N}$. Finally, by the last claim, this is the
case iff (1) holds for all $j \in \{1, 2\}$ and $n \in \mathbb{N}$. q.e.d.



To establish the reduction, it remains to show that the computation of
$\mathcal{M}$ is infinite iff the game $(\mathcal{G}, v_0)$ has a pure Nash
equilibrium where player 0 wins almost surely.

($\Rightarrow$) Assume that the computation $\rho = \rho(0)\rho(1)\ldots$ of
$\mathcal{M}$ is infinite. We define a pure strategy $\sigma_0$ for player 0 as
follows: For a history that ends in one of the instruction gadgets
$I^t_{i,\gamma}$ after visiting a black vertex exactly $n$ times, player 0 tries
to move to the neighbouring gadget $S^{1-t}_{k,\gamma'}$ such that $\rho(n)$
refers to instruction number $k$ (which is always possible if $\rho(n-1)$ refers
to instruction number $i$; in any other case, $\sigma_0$ might be defined
arbitrarily). In particular, if $\rho(n-1)$ refers to instruction
$\iota_i$ = "zero(j) ? goto k : dec(j); goto l", then player 0 will move to the
gadget $S^{1-t}_{k,\mathrm{zero}(j)}$ if the value of the counter $j$ in
configuration $\rho(n-1)$ is 0 and to the gadget $S^{1-t}_{l,\mathrm{dec}(j)}$
otherwise. For a history that ends in one of the gadgets $C^t_{j,\gamma}$ after
visiting a black vertex exactly $n$ times and a grey vertex exactly $m$ times,
player 0 will move to the grey vertex again iff $m$ is strictly less than the
value of the counter $j$ in configuration $\rho(n-1)$. So after entering
$C^t_{j,\gamma}$, player 0's strategy is to loop through the grey vertex exactly
as many times as given by the value of the counter $j$ in configuration
$\rho(n-1)$.

Any other player's pure strategy is "moving down at any time". We
claim that the resulting strategy profile $\sigma$ is a Nash equilibrium of
$(\mathcal{G}, v_0)$ where player 0 wins almost surely.

Since, according to her strategy, player 0 follows the computation of
$\mathcal{M}$, no vertex inside an instruction gadget $I^t_{i,\gamma}$ where
$\iota_i$ is the halt instruction is ever reached. Hence, with probability 1 a
terminal vertex in one of the counter gadgets is reached. Since player 0 wins at
any such vertex, we can conclude that she wins almost surely.

It remains to show that $\sigma$ is a Nash equilibrium. By the definition of
player 0's strategy $\sigma_0$, we have the following for all
$n \in \mathbb{N}$: 1. $c_j^n$ is the value of counter $j$ in configuration
$\rho(n)$; 2. $c_j^{n+1}$ is the value of counter $j$ in configuration
$\rho(n+1)$; 3. $\gamma_{n+1}$ is the instruction corresponding to the counter
update from configuration $\rho(n)$ to $\rho(n+1)$. Hence, (1) holds, and
$\sigma$ is a Nash equilibrium by Lemma 9.



($\Leftarrow$) Assume that $\sigma$ is a pure Nash equilibrium of
$(\mathcal{G}, v_0)$ where player 0 wins almost surely. We define an infinite
sequence $\rho = \rho(0)\rho(1)\ldots$ of pseudo configurations (where the
counters may take the value $\omega$) of $\mathcal{M}$ as follows. Let
$n \in \mathbb{N}$, and assume that $v_n$ lies inside the gadget
$S^t_{i,\gamma}$ (where $t = n \bmod 2$); then $\rho(n) := (i, c_1^n, c_2^n)$.

We claim that $\rho$ is, in fact, the (infinite) computation of $\mathcal{M}$.
It suffices to verify the following two properties:

1. $\rho(0) = (1, 0, 0)$;

2. $\rho(n) \vdash \rho(n+1)$ for all $n \in \mathbb{N}$.

Note that we do not have to show explicitly that each $\rho(n)$ is a
configuration of $\mathcal{M}$ since this follows easily by induction from 1.
and 2. Verifying the first property is easy: $v_0$ lies inside
$S^0_{1,\mathrm{init}}$ (and we are at instruction 1), which is linked to the
counter gadgets $C^0_{1,\mathrm{init}}$ and $C^0_{2,\mathrm{init}}$. The edge
leading to the grey vertex is missing in these gadgets. Hence, $c_1^0$ and
$c_2^0$ are both equal to 0.

For the second property, let $\rho(n) = (i, c_1, c_2)$ and
$\rho(n+1) = (i', c_1', c_2')$. Hence, $v_n$ lies inside $S^t_{i,\gamma}$ and
$v_{n+1}$ inside $S^{1-t}_{i',\gamma'}$ for suitable $\gamma$, $\gamma'$ and
$t = n \bmod 2$. We only prove the claim for the case that
$\iota_i$ = "zero(2) ? goto k : dec(2); goto l"; the other cases are
straightforward. Note that, by the construction of the gadget $I^t_{i,\gamma}$,
it must be the case that either $i' = k$ and $\gamma' = \mathrm{zero}(2)$, or
$i' = l$ and $\gamma' = \mathrm{dec}(2)$. By Lemma 9, if
$\gamma' = \mathrm{zero}(2)$, then $c_2' = c_2 = 0$ and $c_1' = c_1$, and if
$\gamma' = \mathrm{dec}(2)$, then $c_2' = c_2 - 1$ and $c_1' = c_1$. This implies
$\rho(n) \vdash \rho(n+1)$: On the one hand, if $c_2 = 0$, then
$c_2' \neq c_2 - 1$, which implies $\gamma' \neq \mathrm{dec}(2)$ and thus
$\gamma' = \mathrm{zero}(2)$, $i' = k$ and $c_2' = c_2 = 0$. On the other hand,
if $c_2 > 0$, then $\gamma' \neq \mathrm{zero}(2)$ and thus
$\gamma' = \mathrm{dec}(2)$, $i' = l$ and $c_2' = c_2 - 1$. q.e.d.

5.2 Finite-state equilibria 



It follows from the proof of Theorem 8 that Nash equilibria may require
infinite memory (even if we are only interested in whether a player wins with
probability 0 or 1). More precisely, we have the following proposition.

Proposition 10. There exists an SSMG that has a pure Nash equilibrium
where player 0 wins almost surely but that has no finite-state Nash equilibrium
where player 0 wins with positive probability.
Proof. Consider the game $(\mathcal{G}, v_0)$ constructed in the proof of
Theorem 8 for the machine $\mathcal{M}$ consisting of the single instruction
"inc(1); goto 1". We modify this game by adding a new initial vertex $v_1$ which
is controlled by a new player, player 1, and from where she can either move to
$v_0$ or to a new terminal vertex where she receives payoff 1 and every other
player receives payoff 0. Additionally, player 1 wins at every terminal vertex
of the game $\mathcal{G}$ that is winning for player 0. Let us denote the
modified game by $\mathcal{G}'$.

Since the computation of $\mathcal{M}$ is infinite, the game
$(\mathcal{G}, v_0)$ has a pure Nash equilibrium where player 0 wins almost
surely. This equilibrium induces a pure Nash equilibrium of
$(\mathcal{G}', v_1)$ where player 0 wins almost surely.

Now assume that there exists a finite-state Nash equilibrium of
$(\mathcal{G}', v_1)$ where player 0 wins with positive probability. Such an
equilibrium induces a finite-state Nash equilibrium of $(\mathcal{G}, v_0)$
where player 1, and thus also player 0, wins almost surely since otherwise
player 1 would play to $v_0$ with probability 1. By Lemma 9, this implies that
player 0 updates the counter correctly. However, since player 0 uses a
finite-state strategy, the corresponding counter values are bounded by a
constant, a contradiction. q.e.d.

Note that FinNE is recursively enumerable: To decide whether an
SSMG $(\mathcal{G}, v_0)$ has a finite-state Nash equilibrium with payoff
$\ge x$ and $\le y$, one can just enumerate all possible finite-state profiles
and check for each of them whether the profile is a Nash equilibrium with the
desired properties by analysing the finite Markov chain that is generated by
this profile (where one identifies states that correspond to the same vertex
and memory state). Hence, to show the undecidability of FinNE, we cannot reduce
from the non-halting problem but have to reduce from the halting problem for
two-counter machines (which is recursively enumerable itself).

Theorem 11. FinNE is undecidable. 

Proof. The construction is similar to the one for proving undecidability of
PureNE. Given a two-counter machine $\mathcal{M}$, we modify the SSMG
$\mathcal{G}$ constructed in the proof of Theorem 8 by adding another "counter"
(together with four more players for checking whether the counter is updated
correctly) that has to be incremented in each step. Moreover, additionally to
the terminal vertices in the gadgets $C^t_{j,\gamma}$, we let player 0 win at
the terminal vertex in each of the gadgets $I^t_{i,\gamma}$ where
$\iota_i$ = "halt". Let us denote the new game by $\mathcal{G}'$. Now, if
$\mathcal{M}$ does not halt, any pure Nash equilibrium of $(\mathcal{G}', v_0)$
where player 0 wins almost surely needs infinite memory: to win almost surely,
player 0 must follow the computation of $\mathcal{M}$ and increment the new
counter at each step. On the other hand, if $\mathcal{M}$ halts, then we can
easily construct a finite-state Nash equilibrium of $(\mathcal{G}', v_0)$ where
player 0 wins almost surely. Hence, $(\mathcal{G}', v_0)$ has a finite-state
Nash equilibrium where player 0 wins almost surely iff the machine
$\mathcal{M}$ halts. The details of the construction are left to the
reader. q.e.d.
6 Conclusion



We have analysed the complexity of deciding whether a simple stochastic
multiplayer game has a Nash equilibrium whose payoff falls into a certain
interval. Our results demonstrate that the presence of both stochastic vertices
and more than two players makes the problem much more complicated than
when one of these factors is absent. In particular, the problem of deciding the
existence of a pure-strategy Nash equilibrium where player 0 wins almost
surely is undecidable for simple stochastic multiplayer games, whereas it is
contained in NP $\cap$ co-NP for two-player, zero-sum simple stochastic games
[11] and even in P for non-stochastic infinite multiplayer games with, e.g.,
Büchi winning conditions [23].

Apart from settling the complexity of NE when arbitrary mixed strategies 
are considered, future research may, for example, investigate restrictions of 
NE to games with a small number of players. In particular, we conjecture that 
the problem is decidable for two-player games, even if these are not zero-sum. 